A steepest descent calculation of RNA pseudoknots 
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We enumerate possible topologies of pseudoknots in single-stranded RNA molecules. We use a 
steepest-descent approximation in the large N matrix field theory, and a Feynman diagram formalism 
to describe the resulting pseudoknot structure. 



An RNA molecule is a heteropolymer strand made up 
of four types of nucleotides, uracil [U), adenine {A), 
guanine (G), and cytosine (C). The sequence of these 
nucleotides, or bases, makes up the molecule's primary 
structure. Bases form hydrogen bonds with each other 
to give the molecule a stable shape in three dimensions, 
with U bonding to A, and C to G. Calculating the shape 
a given primary structure will fold into is important in 
molecular biology. 

We can associate —Uij with the energy of forming a 
hydrogen bond between the ith and jth bases, and let 
Vij = exp{Uij /T) where T is the temperature. This is 
a minimalist model: we make no attempt to account for 
loop penalties or stacking interactions. There is some 
rigidity in the chain of nucleotides, as well as steric con- 
straints, which prevent hydrogen bonding between nu- 
cleotides that are within four bases of each other, so we 
let Vi^i+k = if fc < 4. The partition function associated 
with this bonding is given by 



i<j i<j<k<l i<j<k<l 
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Evidently, Z^^i is the combinatorial heart of the RNA 
folding problem 0|. While Z^ i appears very simple at 
first glance, it contains a term for every possible config- 
uration of bonds on the chain. Finding the folded state 
could involve searching through ^ L\ terms, which is a 
daunting task for even the shortest RNAs. 

Fortunately, in RNA, there is a hierarchical separation 
between primary, secondary and tertiary structures that 
reduces the number of configurations that must be con- 
sidered. One can find the secondary structure by draw- 
ing the chain of nucleotides around the circumference of 
a circle, with the first nucleotide next to the last, and 
finding a bond structure that minimizes the free energy 
with the constraints that all bonds are drawn as arcs 
within the circle, and no bonds cross. Another repre- 
sentation is to draw the bond structures as systems of 
parallel arches which do not cross. This planar configu- 
ration (in the sense used in 0, Q , though other usages 
are common in the RNA folding literature) is made up of 
the secondary structure's characteristic loops and bulges. 
Bonds between distinct parts of the secondary structure 



are called pseudoknots, and are typically considered part 
of the molecule's tertiary structure. For instance, the 
contributions from the third sum in ^ come from pseudo- 
knot configurations. The formation of the tertiary struc- 
ture is believed not to alter the more stable secondary 
structure [Hl|. 

Secondary and tertiary structures are usually stable at 
biological temperatures, which are typically well below 
the RNA molecule's melting point. This makes certain 
very efficient algorithms for determining RNA secondary 
structure at zero temperature possible and useful. These 
"dynamic programming" methods involve recursively cal- 
culating Zl i and then backtracking to find the dominant 
terms, and thus determine which bonds arc present in the 
folded RNA. There are also dynamic programming tech- 
niques that try to account for pseudoknots, but they are 
necessarily slower IE 0- 

The distinction between secondary and pseudoknot 
structure has a topological flavor. One powerful tool for 
dealing with topological considerations is the large ex- 
pansion used in matrix field theories. Originally proposed 
by 't Hooft to represent quantum chromodynamics with 
N colors, it predicts that non-planar Feynman diagrams 
have amplitudes proportional to negative powers of N , 
and are thus suppressed when N is large|i3„ Two of 
the authors applied a similar technique to the problem 
of RNA folding, leading to the same sort of suppression 
of non-planar configurations; we summarize the results 
below, and refer the reader to for details. 

One can perform a series of manipulations to find that 
a chain of L bases has 



Z 



C 



(2) 

where the integral is taken over all Hermitian (L + 1) x 
[L -\- 1) matrices A. C is an unimportant normalization 
constant and M is a matrix function of A given by 



Mij = 5i.j - + i\JVi-i^j 



(3) 



Here, is a used to keep track of topology, as mentioned 
above. Thus we can expand in powers of 1/N and evalu- 
ate the integral by steepest descent. We need to find the 
stationary point of the "action" 
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S{A) = -ixA^ -It logM(A) 



(4) 



which requires solving ^£Ld) 



0. This occurs at the 



point which is defined by 



Aik^i\fWk{M-^)iM^^ 



(5) 



We define a new matrix in terms of M ^ at the sta- 
tionary point, 



segment is itself dressed by secondary structure elements. 



Equation © can be represented pictorially, as in fig. 1(a) 
There are four indices on Ay ^; , indicating where the seg- 
ments begin and end, so we call it a "gluon propagator". 
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(a) Bethe-Salpeter relation 



(6) 



and use the trivial identity Mij{M ^)jk — 5ik to de- 
rive the Hartree equation 



(7) 



This equation is recursive, and we need to impose the 
boundary condition that Gi^i+i = for Z > 2 to solve 
it. Then, Gij is the partition function of the heli- 
cal secondary structure of a chain that starts with the 
jth base and ends with the ith base. This form is 
precisely that used in existing dynamic programming 
algorithms 0, 0, @. Since it carries two indices, Gij 
is analogous to the quark propagator in large N QCD, 
which carries two indices for color. The recursion relation 
ensures that it is a "dressed" propagator. 

We can then introduce the fluctuation Xij , defined by 
Aij = Aij + Xij/VN, and expand tr log(M~^(A)) and 
M~'^{A) as power series in x. Then we collect powers of 
to find corrections to the steepest descent approx- 
imation of Z^ i. We are left with Gaussian integrals in 
Xij that can be evaluated by applying Wick's theorem, 
with contractions given by the inverse of the quadratic 
form in the exponential. This inverse is a propagator 
which satisfies the Bethe-Salpeter equation 



^kljmn — ^km^nl 



iGj-ij+iAi 



(8) 



While the Hartree equation gave the partition function 
for a single contiguous chain of RNA interacting with 
itself, the Bethe-Salpeter relation gives the contribution 
from two separated segments. 

Physically, A represents the resummation of all lad- 
der diagrams between anti-parallel segments, where each 




(b) Sample structure 
contributing to A. 



FIG. 1: The A propagator. 



in analogy to gluon propagators in QCD, which carry four 
color indices. A typical structure contributing to A is 
shown in fig. 1(b) Each single line propagator is dressed 
by any system of arches. There can be any number of 
parallel interactions between the two strands. The only 
constraint is that no interaction lines should cross. Note 
that a system of arches along one line (RNA strand) typ- 
ically represents a piece of helix on this strand, whereas a 
system of parallel interactions between the two lines can 
represent a helical fragment between the two strands. 



There are two ways of drawing Feynman diagrams for 
these propagators. The first was introduced in jy, and is 
useful for visualizing the RNA's topology. The second 
is the double- line formalism of 't Hooft, which makes 
it very easy to find a graph's order in 1/A^, by assign- 
ing appropriate powers powers of N to loops, edges and 
vertices 111 . It follows from © that the A propagator 
contains powers of V^^ , but the partition function 
contains only whole powers of Vij . Thus, all A's in the ex- 

rl/2 



pansion appear with factors of V^^^^ , as F-^'^Ay^fc; V^,; 
This is reflected in the diagrams in fig. [3 We can then 
expand Z^ i to order iV~^, getting the secondary struc- 
ture as well as the tertiary correction to it. Then 



.1/2, 
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-B2T4 — — T^lT^ 

4 3 5 



1 

12' 



—B^nTi + —B^Ti - —BiTi ] Al- 



ls 



162 



(9) 
(10) 



L+1,1 / 
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where we use the value of AI~^ at the stationary point 
from H5I()|I . We have also introduced some convenient 
shorthand for matrices and traces that contain powers of 



X, 



mm' ^m'n 



kl 



Tr, 



tr Br, 



The angle brackets in H1Q(I mean the included terms 
should be integrated over Xij with the Gaussian weight 
exp[— (tra;^ + tr(M~^c)^)/2]. These integrals are sim- 
ple in principle, as the Xij 's can be contracted with the 
Bethe-Salpeter propagator (jHJ. Each power of x intro- 
duces a vertex for gluon lines. 

The multiplication implicit in the definition of Bp is 
matrix multiplication, so many indices must be summed 
over when evaluating the terms in (|l()|l . For instance, 
evaluating one of the contractions of (B^AI^^) produces 
the sum. 



BiAr 



' L+1,1 



E 



GLA+lGj,k+l 



X Gl,ra+lGn,o+lGp_i 

^ ^ i,n+l'-^i-.n+l,j + l,m ^ j+l,m 

^ '^k!p+l^k,p+l,l + l,oVl'_llj^ (11) 

Looking at the diagram associated in the contraction in 
fig. 121 and using the condition that Ga,a+b = for 6 > 2, 

"Quark" propagator dj 



"Gluon" propagator V^/^ Aij^kiV^/^ 



3 I 
Diagrams used 
in this paper 




't Hooft diagrams 
FIG. 2: Propagators. 



we deduce the proper constraint for the indices, L > i > 
j>k>l>m>n>o>p>0. 




FIG. 3: Diagram for ( \B4M 



The B,n and r„ terms have simple 't Hooft diagrams, 
as shown in fig. ^ The ellipses in the diagram represent 
the string of m or n gluon vertices associated with those 
terms. The graph for T„ closes on itself, refiecting the 
trace's cyclic symmetry. 



(a) Bm 



(b) T„ 



FIG. 4: Matrix products 

These diagrams make it simple to pick out the Wick 
contractions that actually contribute to Z^ i. One can 
draw Feynman diagrams for the contractions of the 7 
terms in (jlOII , and find that 25 of them are distinct (many 
contractions are equivalent under the cyclic symmetry of 
the traces T!„). However, most of these vanish, as they 
contain closed G loops. Diagrams involving closed loops 
will depend on a factor of Gi^i+i for I > 2, and therefore 
vanish. This can also be understood in terms of the di- 
agrams from where G's represent segments of RNA, 
and A's represent interactions between two segments. A 
closed G loop with both ends connected to the same side 
of a A propagator describes a closed loop of RNA in- 
teracting with the main strand. We have specifically ex- 
cluded this possibility from our definition of so such 
configurations must vanish. This is the reason why there 
is no graph of order 1/A^ in 

As an example, consider {B^T^A/[~^) , which can be 
contracted in the three distinct ways shown in figs. [Sfa), 
(b) and (c). Each of these occurs with a symmetry factor 
of 3, since an Xij from the B^ can be contracted with any 
of the (cyclically equivalent) Xm/'s in T3. Only the dia- 
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(b) (c) 
FIG. 5: Contractions for {BiT-ilvr^) 



they cannot be disconnected by opening a single quark 
line), and can thus be re-summed by a Dyson equation. 
Define Smn as the sum of all the amputated pseudoknot 
diagrams defined above (i.e. the sum of all 0{N~'^) IPI 
diagrams with their external G propagators removed). 
Then the partition function Zmn satisfies the usual Dyson 
equation: 



gram in fig. EJa) can be traced with an unbroken line — 
the other diagrams contain closed loops. Thus, only one 
of the three sorts of contractions contributes to the par- 
tition function. 

When all the contractions have been carried out, there 
remain 8 non- vanishing graphs, which are shown in fig. 
|S| The contractions associated with each diagram are 



Figure 


Contraction 


Pseudoknot 


(a) 




ABAB 


(b) 


B2TiM-^ 


ABACBC 


(c) 




ABCABC 


(d) 




ABCBCA 


(e) 




ABCBDCDA 


(f) 


BiTsTiM-^ 


ABCDBCDA 


(g) 




ABCADBCD 


(h) 


BiT^M-^ 


ABCDBECDEA 



The alphabetic notation, common in the biochemical lit- 
erature, shows the order in which sites pair with each 
other. For example, "ABAB" indicates that the first and 
third vertices (both denoted by "A" ) are paired, and that 
the vertex between them is linked to the fourth vertex 
(both denoted by "B"). 

Since the pseudoknots we consider contribute to order 
only one pseudoknot may be present at a time. 
This problem can be solved by noting that all the pseu- 
doknot diagrams are one particle irreducible (IPI, i.e. 



G 

rnn + 2^ Zmk^klGln (12) 
m < < / < n 

Once the 8 diagrams for S have been calculated, the full 
partition function (with any number of pseudoknots) can 
be calculated using the above recursion relations. Present 
knot-prediction algorithms use dynamic programming al- 
low knots which have bonds drawn inside and outside of 
the disc, as long as they are no crossings 0. This ex- 
cludes certain topologies our algorithm provides for, like 
ABCABC pseudoknots. On the other hand, these al- 
gorithms do provide for some topologies that we've ex- 
cluded as 0(7V-4). 

The method presented allows us to calculate the parti- 
tion function in 0{L^) time, so it can be used for folding, 
by backtracking to pick out the largest term in the parti- 
tion function. The strategy for doing so is the following: 
i) solve for the Hartree partition function {Tj) , ii) solve the 
Bethe-Salpeter recursion equation to get AkLmn, ih) 
calculate the eight amputated diagrams of fig. 5 making 
up the IPI function Smn, iv) solve the Dyson equation 
(|12|l by recursion to obtain the full partition function 
with any number of pseudoknots, v) and then backtrack 
to find the largest term in this partition function. 

Some numerical calculations are under way and we 
hope to present those results in a future paper, along 
with an explicit calculation for the order iV^^ folding of 
a short (L ~ 10) RNA. 



[1] H. Orland and A. Zee, Nucl. Phys. B [FS] 620 (2002) 
456-476. 

[2] R. Nussinov and A.B. Jacobson, PNAS 77 (1980) 6309. 
[3] S. Coleman, Aspects of Symmetry (Cambridge University 

Press, New York, 1985), Ch. 9. 
[4] P.G. Higgs, Quarterly Reviews in Biophysics 33 (2000) 

199. 

[5] I. Tinoco Jr. and C. Bustamante, J. Mol. Biol. 293 (1999) 
271. 

[6] M.S. Waterman and T.F. Smith, Adv. Applied Maths. 7 
(1986) 

[7] E. Rivas and S.R. Eddy, J. Mol. Biol. 285 (1999) 2053. 
[8] G. 't Hooft, Nucl. Phys. B 72(1974) 461. 
[9] M. Zuker, Science 244 48. 



[10] D.K. Lubensky and D.R. Nelson, Phys. Rev. Lett. 85 
(2000) 1572. 

[11] I.L. Hofacker, W. Fontana, P.P. Stadler, L.S. Bonhoeffer, 
M. Packer and P. Schuster, Monatshefte fur Chemie 125 
(1994) 167. 

[12] A. Montanari and M. Mezard, Phys. Rev. Lett. 86 (2001) 
2178. 

[13] R. Bundschuh and T. Hwa, Phys. Rev. Lett. 83 (1999) 
1479. 

[14] J.S. McCaskill, Biopolymers 29 (1990) 1105. 
[15] H. Zhou, Y. Zhang and Z-C. Ou-Yang, Phys. Rev. Lett. 
86 (2001) 356. 



(b) (c) BsTaM-^ (d) BiTbM-i (e) B^TzTiM 

S2T4M-1 




(f) BiTzTiM-^ (g) S2T|M-i (h) SiT|M-i 

FIG. 6: Non-vanishing contractions 



